Longitudinal Functional Models with Structured 

Penalties 



Madan G. Kundu 1 , Jaroslaw Harezlak 1 ' 2 '* and Timothy W. Randolph 3 

1 Department of Biostatistics 
Indiana University School of Medicine 
Indianapolis, IN 46202 

2 Department of Biostatistics 
Indiana University Fairbanks School of Public Health 
Indianapolis, IN 46202 

3 Biostatistics and Biomathematics Program 
Fred Hutchinson Cancer Research Center 
Seattle, WA 98109 

*harezlak@iupui . edu 

November 21, 2012 

Abstract 

Collection of functional data is becoming increasingly common including 
longitudinal observations in many studies. For example, we use magnetic 
resonance (MR) spectra collected over a period of time from late stage HIV 
patients. MR spectroscopy (MRS) produces a spectrum which is a mixture 
of metabolite spectra, instrument noise and baseline profile. Analysis of such 
data typically proceeds in two separate steps: feature extraction and regres- 
sion modeling. In contrast, a recently-proposed approach, called partially em- 
pirical eigenvectors for regression (PEER) |31j . for functional linear models 
incorporates a priori knowledge via a scientifically-informed penalty operator 
in the regression function estimation process. We extend the scope of PEER 
to the longitudinal setting with continuous outcomes and longitudinal func- 
tional covariates. The method presented in this paper: 1) takes into account 
external information; and 2) allows for a time-varying regression function. In 
the proposed approach, we express the time-varying regression function as 
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linear combination of several time-invariant component functions; the time 
dependence enters into the regression function through their coefficients. The 
estimation procedure is easy to implement due to its mixed model equivalence. 
We derive the precision and accuracy of the estimates and discuss their con- 
nection with the generalized singular value decomposition. Real MRS data 
and simulations are used to illustrate the concepts. 

Keywords: Functional data analysis, longitudinal data, mixed model, struc- 
tured penalty, generalized singular value decomposition. 

1 Introduction 

Technological advancements and increased availability of storage of large datasets 
have allowed for the collection of functional data as part of time-course or longitudi- 
nal studies. In the cross-sectional setting, there have been many proposed methods 
for estimating a regression function in a so-called functional linear model (fLM). This 
function is a functional (continuous) analogue of a vector of (discrete) regression 
coefficients; it connects the scalar response, y to a functional covariate, w = w(s). 
Although these models have recently been well studied, extensions to longitudinally- 
collected functions have not received much attention. A recently-proposed approach 
called longitudinal penalized functional regression (LPFR) approach extends cross- 
sectional fLM to a longitudinal setting by incorporating subject-specific random 
intercepts [15]. One of the key assumption in LPFR is that the regression func- 
tion remains constant over time. Due to this restrictive assumption, LPFR is not 
suited for situations in which the association between the functional predictors and 
scalar response may evolve over time. In this manuscript, we propose a technique 
that extends the analysis of functional linear models by relating scalar outcomes to 
functional predictors, both observed longitudinally, and allows the estimation of a 
time- varying regression function. 

The method fits into a generalized ridge regression framework by imposing a 
scientifically-informed quadratic penalty term into the estimation process. The re- 
sulting estimate is a function that is represented by a set of "partially empirical" 
eigenvectors that arise from a joint eigen-basis decomposition of the predictor func- 
tions and the penalty term; see [31] • The extension of this framework to the longi- 
tudinal setting has two major advantages: 1) the regression function is allowed to 
vary over time; and 2) external or a priori information about the structure of the 
regression function can be incorporated directly into the estimation process. We 
formulate the estimation procedure within a mixed-model framework making the 
method computationally efficient and easy to implement. 

Ramsay and Dalzell [2H] introduced the term functional data analysis (FDA) in 
the statistical literature. The cross-sectional fLM with scalar response can be stated 
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as follows (see e.g., [38J) 



E(y\W)=fi y + [ W(sh(s)ds 



Jn 



where fi y is the mean of y, Q denotes the domain of the predictor functions W(s), 
s£fl, and 7(5) is a square integrable function that models the linear relationship 
between the functional predictors and scalar response. We will assume that W(-) 
denotes a mean-centered function (i?[W(s)] = for almost all s G Q). As there 
is no unique 7Q that solves this equation, additional regularization or constraint 
is required. Typically, some form of smoothness is imposed on 7(-), one approach 
being to expand both regression function and predictor function W(-) in terms 
of a set of spline basis functions such as B-splines and then obtain the regularized 
estimate of 7(-) (30]. Another approach is to express the regression function 7(-) in 
terms of the orthonormal eigenf unctions of covariance ofW(-) using Karhunen-Loeve 
(K-L) basis expansion (see, e.g., [22])- A third approach is to combine the above 
two approaches, known as penalized functional regression (PFR) approach [14]. In 
PFR approach, a spline basis is used to represent the regression function and a basis 
of eigenfunctions from the set of predictors is used to represent each W(-). Another 
approach is to use wavelet basis, instead of eigenfunctions, to represent the predictor 
functions [24] . 

Here we adopt an approach by Randolph et al. [3T] which does not begin by 
explicitly projecting onto a pre-specified basis of functions. Instead, prior informa- 
tion about functional structure is incorporated into the estimation process by way 
of a penalty operator. "Partially empirical eigenvectors for regression" (PEER) ap- 
proach exploits the fact that, in the familiar framework of penalized least-squares 
regression, the estimate arises from a set of basis functions that is determined jointly 
by the covariance (empirical spatial structure) and the penalty (imposed structure). 
This naturally extends ridge regression (noninformative structure) and smoothing 
penalties such as a second- derivative penalty (smooth regression function assump- 
tion). In this paper, we extend the scope of the PEER approach to the longitudinal 
setting in a manner that allows the estimated regression function 7 = j(t, ■) to vary 
with time. 

The problem we address involves repeated observations from each of N subjects. 
At each observation time, t, we collect data on a scalar response variable, y, and 
a (idealized) predictor function, W(-). We are interested in longitudinal regression 
models of the following form: 



Here j(t, •) denotes the regression function at time t, x t is a vector of scalar-valued 
(non-functional) predictors. In a similar spirit to that of a linear mixed model with 
time- related slope for longitudinal data, we assume that j(t, •) can be decomposed 
into several time-invariant component functions; e.g., j(t, •) = 7o(-) + tji(-). 




(1) 
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The recently proposed LPFR approach assumes the regression function in ([TJ 
is independent of time and proceeds in three steps: uses a truncated set of K- 
L vectors to represent the predictor functions; expresses the regression function 
using a spline basis; and fits the longitudinal model using an equivalent mixed- 
model framework that incorporates subject-specific random effects. In contrast, we 
model the coefficient function j(t,-) as a time-dependent combination of several 
time-invariant component functions, {ld{-)}d=oi eacn °f which is estimated via a 
penalty operator that is informed by the structure of the data or a scientific question. 

Our work is motivated by a study in which magnetic resonance (MR) spectra 
have been collected longitudinally from late stage HIV patients |20j. The observed 
spectra from MR spectroscopy (MRS) are a mixture of pure metabolite spectra, 
baseline profile and instrument noise. 

Pure metabolite spectra provide information about the spatial structure of the 
observed spectra and consequently can be used to inform the regression function. 
Figure [I] shows the baseline MR spectra, W(-), from a brain region called basal 
ganglia for 114 subjects and also plots nine pure metabolite spectra. We are in- 
terested in studying the association of y with the metabolite concentrations. It is 
natural that the spatial structure (shape) of these metabolite spectra should inform 
the process of estimating the relationship between y and W(-); i.e., the subspace 
spanned by the functional structure of the pure metabolite spectra may be more in- 
formative than structures such as B-spline or cosine functions that are "external" to 
the problem. Therefore, the statistical methodology applied in this paper estimates 
7 focusing on a subspace that is spanned by a basis of generalized singular vectors 
that arise jointly from the predictors, W(-), and the metabolite spectra. 

The cross-sectional fLM with scalar response has been a focus of various inves- 
tigations [301 1121 EH El El El HI |32]. Some of these methods estimate regression 
functions in two steps. For example, principal components regression (PCR) esti- 
mates of the regression function are obtained first and then these PCR estimates are 
projected onto a B-spline basis [6] or vice- versa; i.e., PCR fitting is performed only 
after projection onto B-splines [32] ■ Extensions of fLM have been made towards 
generalized linear model with functional predictors [221 EH] and quadratic functional 
regression |38j . Another class of models, known as Functional Analysis of Vari- 
ance (FANOVA), decompose repetitively-observed functional predictors into several 
(fixed and random) groups and subject-specific component functions [21 [T8l \TU\ ITT]. 
However, it is important to distinguish our work from FANOVA methods which do 
not relate functional predictor (s) to the scalar response in longitudinal set up, the 
way we did it in this paper. To our knowledge, LPFR |15j is the only published 
method addressing regression estimation in the longitudinal functional predictor 
framework. 

Section [2] establishes notation and specifics on the longitudinal functional model 
considered in this paper. In the applications, we employ a decomposition-based 
penalty and this is review in Section [3~3 First, in Section 3.1 , the concept of a gen- 
eralized ridge estimate (or Tikhonov-Phillips estimate, [36], [25] ) is discussed while 
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Observed MR spectra 



Pure metabolite spectra 




Figure 1: Observed MR spectra from basal ganglia for 114 subjects collected at 
baseline (t = 0) are shown in the left panel and the 9 pure metabolite profiles in the 
right panel. 



Section 3.2 shows that under weak assumptions, our longitudinal generalized ridge 
estimate, along with its bias and precision, can be obtained in terms of generalized 
singular (GS) vectors. These estimates can be obtained as best linear unbiased pre- 
dictors (BLUP) through mixed model equivalence and this is discussed in Section 
4.1| Expressions for precision are derived in Section 4.2 



Numerical illustrations are provided in Section [5j In particular, the simulation in 
Section 5A_ compares LPFR with the method proposed in this paper. The simulation 
in subsection evaluates the influence of sample size and relative contribution 
of prior spatial information on the proposed method using a decomposition-based 
penalty. An application based on experimental data is also illustrated in section [6} 
The methods discussed in this paper have been implemented in the refund package 
[9] in R through the peer() and lpeer(). Throughout the presentation we consider 
a single functional predictor. However, the proposed method has a natural extension 
to settings with more than one functional predictor and this is discussed in the last 
section. 



2 Statistical Model 

We consider Q = [0, 1], a closed interval in IR and let W(-) denotes a random 
function in L 2 (Q). Let Wu{-) denotes a functional predictor from the i th subject (i = 
1, . . . , N) at the t th timepoint (t = ti, . . . , t n .). Technically, an observed predictor 



arises as a discretized sampling from an idealized function, and we will assume that 
each observed predictor is sampled equally at p locations, s\, . . . ,s p G [0,1], with 
sampling that is appropriately regular and dense enough to capture informative 
spatial structure, as seen, for instance, in the MRS data in Section [6] Let ?% := 
[wit(si), • • • , Wit(sp)] T be the px 1 vector of values sampled from the realized function 
Wu(-). Then, the observed data are of the form {yu',Xit', w^}, where yu is a scalar 
outcome, xu is a K x 1 column vector of measurements on K scalar predictors, 
and Wn is the sampled predictor from the i th subject at time t. Denoting the true 
regression function at time t by j(t, ■), the longitudinal functional regression outcome 
model of interest is 

y it = xj t f3+ [ W i t{s)j(t,s)ds + zJ t b i + e it (2) 
Jo 

where, e# ~ N(Q, of) and bi is the vector of r random effects pertaining to subject % 
and distributed as N(0, E&J. As usual we assume that z it is a subset of x it , e it and 
bi are independent, e it and e^/ are independent whenever i ^ i' or t ^ if or both, 
and bi and by are independent if i ^ i'. Here xj t (3 is the standard fixed effect from 
K univariate predictors, zj t bi is the standard random effect and f Q Wi t (s)j(t, s)ds 
is the subject/time specific functional effect. We assume that 7(t, •) G L 2 (n) } for all 

When the association between functional predictor and response changes over 
time, the regression function j(t, s) varies over both spatial and time domain. For 
example, j(t,s) may vary linearly with time, j(t, s) = 7o(s) + *7i(s), or quadrati- 
cally, j(t, s) = 7o(s) + t-yi(s) + t 2 72(s). This is in a spirit similar to a linear mixed 
effects model with linear or quadratic time slope; see e.g., [13]. In general, we as- 
sume that 7(t, s) can be decomposed into several time-invariant component functions 
7o(s), ■ ■ • ,7ij(«) as 

7 (t, s) = To (s) + /i(t)7i(s) + ■ ■ ■ + fD(th D {s) 

where, /i, . . . , /d are -D prescribed linearly independent functions of t and /d(0) = 
for all d; the time component t enters into j(t, s) through these terms. 

At t = 0, j(t, s) reduces to jo(s) and has the obvious interpretation of a baseline 
regression function pertaining to the sampling points s. When D = 0, j(t, s) = 70 (s) 
is independent of t, a situation considered in [15]. In general, each / may be any 
function of t with /(0) = 0, e.g., f(t) = t or t exp(t). We can rewrite the equation Q 
as 

= 4/3 + / W it (s){ lQ (s) + /i(t)7i(s) + • • • + /D(t)7z>(s)}d« + 4 fo i + £it 

In a PEER approach, the dependence of yu on Wjt is seen as a linear dependence 
on observations at p sampling points, Wu, spatial (functional) structure is imposed 
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directly into the estimation of 7^ = [jd(si), ■ ■ ■ ,7d(s p )] T , for d = 0, . . . , D. Com- 
bining all n, = J2i=i n i observations from the N subjects obtained across all time 
points, we express the model as 

y = X/3 + W-y + Zb + e. (3) 

Here, y = [y Ul , ■ ■ ■ , y Uni , • • • , yit N ,- ■ ■ , ym nN ] T is an n, x 1 vector of all responses, 
X = [xj ti , • • • , xj tni j • • • j x Tt N i " ' ■> x ~Nt nN V i s an n » x K design matrix pertaining to 
K univariate predictors, (3 be the associated coefficient vector, 7 = [jj , jj , ■ ■ ■ , 7J] 1 " 
is a (D + l)px 1 vector of functional coefficients, is the corresponding n.x(D + l)p 
design matrix. Further, b is the rN x 1 vector of random effects and Z is the 
corresponding n, x rN design matrix. The matrix W has the structure 









wj tl fi(h)wj tl ■ 


■■ fD(h)wJ ti 


w = 




Wi = 








W N _ 




w Z ni h{tn>l ni ■ 


■■ fD{t ni )wJ tn 



3 Estimation of Parameters with a Penalty 

The formal model in ^ is ill-posed and has no unique solution for 7. Common 
approaches to estimate a regression function in a fLM involve reducing dimension 
by projecting onto a subspace defined by a few K-L (empirical) eigenvectors or onto 
the span of a set of spline basis functions. Alternatively, our use of a generalized 
ridge penalty constrains the estimation of 7 in the spatial (or s) dimension without 
preliminary smoothing or explicit dimension reduction. The process encourages 
structure of a particular type via the choice of penalty operator. In the longitudinal 
(or t) dimension, 7 is more explicitly and severely constrained by the choice of 
h, ■ ■ ■ , Jd- 

3.1 Generalized Ridge Estimate 

The model of interest described in the previous section can be written as follows: 

y = XP + W~i + e* (4) 

where e* = Zb + e ~ N(0, V) and V = ZT,i,Z T + a\I. Further, assume L d be the 
penalty operator for r ) d and be the associated tuning parameter, V d — 0, . . . , D. 
Then the penalized estimates of ft and 7 can be obtained by minimizing 

\\y - X(5 - W^WU + X 2 \\jo\\lr Lo + ■■■ + \ 2 d \\id\\ 2 lILd (5) 

here we have used the notation ||a||^ = a T Ba, where B is a symmetric, positive 
definite matrix. A generalized ridge estimate of (3 and 7 based on minimizing the 
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above expression is obtained as (see, e.g., [31], p. 66): 



P ={C T V- 1 C + D)- 1 C T V- 1 y (6) 
7. 

where, C = [X W], D = blockdiag{0, L T L} and L = blockdiag{A I/o, • • • , ^dLd}- 



3.2 Connection with the GSVD 

A PEER estimate and its mean squared error (MSE) can be expressed in terms 
of generalized singular (GS) vectors obtained through generalized singular value 
decomposition (GSVD) for the cross-sectional fLM with single functional predictor 
and no random effect [31]. For background on the GSVD and its computation, see 
[371 EH EH EES] . Here, we derive a similar expression for the generalized ridge estimate 
7 of the regression function in ([6]). 

After some algebra, the generalized ridge estimate in ^ for 7 can be expressed 

as 

7 = -A 1 X T V- 1 y + A 2 W T V- 1 y 

where 

aJ = (x T v- l xy 1 x T v- l w[w T v~ 1 w + l t l - w T v- 1 x(x T v- 1 x)- 1 x T v- 1 w]- 

A 2 = W T V~ 1 W + L T L - W T V- l X(X T V- 1 X)- 1 X T V~ 1 W 

When X = (a situation without any scalar predictors) or X T V^ 1 W = the 
generalized ridge estimation of 7 can be put into a PEER estimation framework in 
terms of GS vectors, as discussed below. 

With X = or X T V~ 1 W = 0, the 7 reduces to [W T V~ l W + L T L]~ 1 W T V~ 1 y. 
Moreover, in this case generalized ridge estimate of f3 becomes [X T V~ 1 X]~ 1 X T V~ 1 y. 
Now, if we transform W := V~ l / 2 W and y := V~ 1 ^ 2 y, we can rewrite L as 

L = X blockdiag { L , 7^1, • • • , ~^-L D \ = X L S 
I Ao A J 

Here, L s can be interpreted as a scaled L where scaling is done for all the tuning 
parameters associated with the 'longitudinal' part of the regression function with 
respect to the 'baseline' tuning parameter. 

Set p = (D + l)p, let m denote the number of rows in L and set c = dim[Null(L)]. 
Further, assume that n, <m<p<m + n, and the rank of the (n» + m) x p matrix 
[W T (L S ) T ] T is p. The following describes the GSVD of the pair (W,L S ): there 
exist orthogonal matrices U and V, a nonsingular Q and diagonal matrices S and 
M such that [3T] 

W = USg- 1 S = [0 S] S = blockdiag{5i, I P - m } 

L s = VMQ- 1 M = [M 0] M = blockdiag{Jp_ n ., M x } 
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Submatrices Si and Mi have £ = n. + m — p diagonal entries ordered as 

0>£>£>'.">£>i where ' °i + fi = 1 > * = i>->< 

Here, the columns {g^.} of (/ are the GS vectors determined by the GSVD of the pair 
(W, L s ). Denote the columns of U and V by Uk and v k , respectively. Now, it can be 
shown that \W T V~ l W + L T L]~ 1 W T V~ 1 = \W T V~ l W + Ag(L 8 ) T L 8 ]- 1 W T y- 1 = 
£/(iS T 5 + AgA^-M) -1 ^ W T V~ 1 / 2 and consequently, 7 can be expressed as 

^ = g(S T S + \lM Y M)- l S T U T y= 2 f k 2 -u T k yg k + u kV9k 

k=p-n.+l ° k + Ao/ifc ° k k=p-c+l 

Further, the bias and variance can be expressed as 

Bias[j] = (/ - W*W)~i = g{s T s + x 2 M T M)- 1 {\ 2 M T M)g- 1 

fc=i s'fcfl'fc 7 + z^fc=p-n.+i ^|+Ag^i ^ 

Varpy] = W /# V r (W /# ) T = G{S T S + A^ T A1)- 1 «S T <S(«S T «S + A^ATM)" 1 ^ 

Ep—c <t| ~p , t 

fc=p-n.+l (al+\l^ly9k9k ' l^k=p~c+l 9k9k 

where, J¥ # = [W T V r ~ :L lV+Ir T Ir]~ 1 lV T V r ~ :l and g k denotes the jfeth column of Q~ T = 
{Q^ 1 ) 1 = (G T )~ 1 - Further, we can express bias as [W T V~ 1 W + L T L]~ 1 L T L^f which 
means 7 will be unbiased only when 7 e Null(L). 

For estimates obtained using this technique, the bias and variance can be ex- 
pressed in terms of generalized singular vectors, provided the assumption of X T V~ 1 W 
applies. In this case, one can show that j5 is simply the generalized least squares 
estimate from the linear model y = X/3 + e*, and 7 is the generalized ridge estimate 
from y = Wj+e* with penalty L. That is, (3 is estimated as if were not present, 
and 7 is estimated as if X/3 were not present. The PEER estimate discussed in this 
section can be thought of as an extension of the estimation discussed in [3T] in two 
ways: we allow for a general covariance matrix V (for y) and we also extend the 
penalty operator to apply across multiply-defined domains, L , . . . ,Ld- 



3.3 Decomposition based penalty 

Our goal is to estimate 7 imposing some presumed functional structure. In other 
words, the aim is to supplement (not to smooth or otherwise alter) the predictor 
function with knowledge about spatial structure in a mathematically tractable way. 
A common approach to incorporate spatial structure into the functional regression 
model is to use the strongest structure from the predictors by considering only first 
few K-L vectors [THl E] • However, we will incorporate spatial structure through an 
informed choice of penalty operator as proposed in |31j . 
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Let 7d = 7L d ,A d be the estimate obtained from the penalty operator Ld and 
tuning parameter \ d , for each d = 0, . . . , D. For example, Ld may denote I p (a ridge 
penalty) or a second-order derivative penalty (giving an estimate having continuous 
second derivative). Alternatively, with prior knowledge about potentially-relevant 
structure in a regression function, a decomposition-based targeted penalty can be 
defined in terms of a subspace defined by such structure [31]. To be precise, if it 
is appropriate to impose scientifically-informed constraints on the "signal" being 
estimated by 7, this prior may be implemented by encouraging the estimate to be 
in or near a subspace, Q C L 2 (Q). 

Returning to our notation that reflects functional predictors observed at p sam- 
pling points, we represent Q by the range of a p x J matrix Q whose columns are 
qi, . . . , qj. Consider the orthogonal projection Pq = QQ + onto the Range(Q), where 
Q + is Moore- Penrose inverse of Q. Then a decomposition-based penalty is defined 
as 

L Q = a P Q + ai(I - Pq) (7) 

for scalars «o and a\. To see how Lq works, let 7^ be any estimate of jd- When 
7d G Sp(Q), we have L Q ^ d = a 7d, but when ^ d £ Sp(Q), we have L Q ^ d = ai7d- The 
condition aci > «o imposes more penalty for 7^ ^ Pq compared to when 7^ e Pq. 
The weights a% and «o determine the relative strength of emphasizing Q in the 
estimation process. Note, in particular, that taking a\ = a results in a ridge 
estimate and that Lq is invertible, provided a.\ and ao are nonzero. Analytical 
properties of estimates from this family of penalties have been discussed in [31 J . 



4 Mixed model representation 

Estimates of (3 and 7 obtained by minimizing the expression in equation ^ cor- 
respond to a generalized ridge estimate. In this section we aim to construct an 
appropriate mixed model that minimizes the expression in equation (|5]). In general, 
the penalty, L, is not required to be invertible but for simplicity this will be as- 
sumed here. The mixed model approach provides an automatic selection of tuning 
parameters. REML-based estimation of the tuning parameters has been shown to 
perform as well as the other criteria and under certain conditions it is less variable 
than GCV-based estimation [321 • 



4.1 Estimation of parameters 

Using Henderson's (1950) justification [21], one can show that the model y = X(3 + 
W~i + e* where, e* ~ N(0, V) and 7 d ~ N(0, ±(L T d L d )- 1 ), for each d = 0, . . . , D, 

minimizes the expression in equation (p| to obtain the BLUP. Thus the generalized 
ridge estimate of /3 and 7 correspond to the BLUP from the following model: 

y = X/3 + W/*7* + e 
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where, W* = [W Z], 7* = [ 7 T b T ] T ~ N[0, £ 7 *] and e ~ N(0, a 2 J) with 

S 7 * = blockdiag{(L T L) _1 , £5} E& = blockdiaglS^, • • • , £5^} 

This representation allows us to estimate fixed and functional predictors simply 
by fitting a linear mixed model (e.g., using the lme() of the nlme package in R or 
PROC MIXED in SAS). 



4.2 Precision of Estimates 

Our ridge estimate is the BLUP from equivalent mixed model, hence the variance of 
the estimate depends on whether the parameters are random or fixed. Randomness 
of 7 is a device used to obtain the ridge estimate while e and b in our case are truly 
random. With this argumentation, it can be advocated that variance of estimates 
to be conditional on 7, but not on b [51] . BLUP of (3, 7 and b can be expressed as 

(amen]: 

= (X T V 1 - 1 xy 1 X T V 1 - 1 y 7 = {L T L)- x W T V{-\y-X$) b = H b Z' T V{\y-X(3) 

where, V\ — V + W (L T L)~ 1 W T . (3 is an unbiased estimator of (3, but 7 is not 
unbiased. It is trivial to see that Cov(y|7) = V. Thus, the variances of /3 and 7, 
conditional on 7, are: 

Cov(^| 7 ) = (x T v 1 ~ 1 xy 1 x T v 1 - 1 vv 1 ~ 1 x (x T v 1 ~ 1 xy 1 

Cov(7| 7 ) = AyVA^ A 7 = (L T L)~ 1 W T V^ 1 {Vi - X(X T \/f 1 X) T }Ff 1 

To obtain the unconditional variance, we need to replace V by V\ in the above ex- 
pressions but this is will overestimate the variance of the estimates. The expressions 
for predicted value of y and its variance are: 

y = X(3 + Wj + Zb Cov(y| 7 ) = AyVAy 

where, 

A y = [{Vx-wl 1 LW-z^ b z T y 1 x (x T v- 1 xy 1 x t v- 1 +wl t lw t +zi: b z T }v 1 - 

Let, T = [1 fi(t) ■ ■ ■ fdif)} ® Ik ■ Then the discretized version of regression 
function at time t is 7( t ) = [j(t, Sj), • • • , j(t, sk)] = Tj. Therefore, the estimate of 
7(i) is 7( t ) = T7 and the estimate of its variance is TCov(7|7)T T . 



5 Simulation 

We present two simulation studies to evaluate the properties of the proposed method 



which we refer to as "LongPEER". The first simulation study (Section 5.1) com 



pares the performance of the LongPEER method with the LPFR approach. In the 
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second simulation study, only the LongPEER method is considered. The purpose 
of the second simulation study is to evaluate the influence of sample size and the 
contribution of prior information about spatial structure (as determined by two dif- 
ferent tuning parameters «o and a.\ in equation [7| on the LongPEER estimate. In 
both the simulation studies, the simulated predictor functions resemble the MRS 
data. All results summarized in this section are based on 100 simulated samples. 

For each subject and visit, predictor functions were simulated independently. 
Predictor functions are "bumpy" curves with bumps at some prespecified locations. 
White noise was added to the predictor functions to account for the instrumental 
measurement noise. Bumpy regression functions were generated with bumps at some 
(but, not all) of the bump locations of the predictor function. For the simulation in 
Section [5\T| the regression function is assumed to be independent of time, whereas it 
varies with time in simulation in Section |5.2| For both the predictor and regression 
functions, 100 equispaced sampling points in [0,1] are used. 

For the decomposition based penalty, the matrix matrix is defined as follows: 
1) select the discretized functions qj,j = 1, . . . , J spanning the "non-penalized- 
subspace" and 2) compute Ld = QQ + , where Q = col[gi, . . . , qj]. Vectors qj are 
discretized functions defined to be zero except at one bump corresponding to a 
region in the simulated predictor functions. For ridge and second-order penalties, 
the matrices L d are defined as I and V 2 , respectively, where D 2 = [dij] is a square 
matrix with entries d^i = d iti+ 2 = 1, d iyi+ \ = —2 and dij = otherwise. 

Estimation error was summarized in terms of the mean squared error (MSE) of 
the estimated regression function defined as ||7 — 7|| 2 , where 7 denotes the estimate 
of 7. Further, MSE was decomposed into the trace of the variance and squared norm 
of bias. We also calculated sum of squares of prediction error (SSPE) as \\y — y\\ 2 , 



where y denotes the estimate of y. The estimates based on the proposed methods, 
including the LongPEER estimate, were obtained as BLUPs from the mixed model 
formulation described in Section 14.11 

5.1 Comparison with LPFR 

As previously stated, LPFR estimates a regression function that does not vary with 
time. Therefore, in the first set of simulations, we generated the outcomes using 
a time-invariant regression function (i.e., j(t,s) = 7o(s), for all t). The following 
model was used to generate the outcome data: 



where 
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The bumpy predictor functions were generated from the following equation 

he Hi 

+(61 + 0.9)exp 

where ci/,, c 2 h and a h are defined in Table [TJ {£ih, h E Hi}, {^h, h £ H 2 }, and £31 
were drawn independently from Uniform(0, 0.1). Also, /3q = 0.06, e# ~ N[0, (0.02) 2 ] 
and bi ~ iV[0, (0.05) 2 ]. 

Table 1: Values of Ci^, c 2 h, a h and for genereation of predictor and regression 
function in simulation studies in Sections 15.11 and 15.21 
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We applied both LPFR (using lpfr() available in the refund package in R [9]) 
and the LongPEER method with different forms of structured penalties to the simu- 
lated data. To obtain LPFR estimate, the dimension of both principal components 
for predictor function and truncated power series spline basis for the regression 
function were set to default values of 30. 

Table [2] displays the MSE and prediction error obtained for different estimates. 
The range of SSPE for the considered methods is very narrow (1.116 and 1.168) 
indicating that all methods perform well when the prediction metric is applied. 

The LongPEER estimate has much smaller MSE when compared to the other 
three estimates. Ridge estimate has less bias compared to the estimate obtained 
using second-order difference penalty. However, due to the larger variance, in over- 
all, the ridge estimate has larger MSE compared to the estimate obtained using 
second-order difference penalty. Both the bias and variance are largest for the LPFR 
estimate and consequently it has the largest MSE of the four estimates. 

Figure [2] displays the four estimates of the regression function. The LongPEER 
estimate is almost identical to the true regression function. Bump locations and 
magnitudes are very close to the true function and nearly as smooth as the true 
function. The estimated regression function obtained using ridge penalty is also 
quite close to the true function including the location and the magnitude of the 
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Table 2: Estimation and prediction errors obtained using Ridge penalty, second- 
derivative penalty (-D 2 ), LPFR and LongPEER based on 100 simulated datasets. 
The sample size is set at N=100 and number of observations at = 4. 
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Figure 3: SSPE and MSE for simulation study in Section 5.2 



bumps. However, the ridge estimate is more wiggly than the LongPEER estimate. 
The LPFR estimate and the estimate obtained using second-order difference penalty 
are oversmoothed in that the magnitude of the bumps is underestimated or missed 
altogether. The increased performance of the LongPEER estimate is due to its 
ability to exploit presumed structural information which is ignored by the other 
estimates. 



5.2 Simulation with time varying regression function 

In this simulation, the regression function varies as a parametric function of time. 
We are not aware of other longitudinal functional regression methods for estimating 
a time-varying regression function so we display on the performance of LongPEER. 
The primary goal is to assess the effects of sample size and the relative contribution 
of external information as determined by «o and ot\ in equation [7j on the regression 
function estimate. 

Without loss of generality, we set ao — 1 and vary a\ on an exponential scale. 
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Larger values of oc\ indicate greater emphasis of prior information on the estimation 
process. The model considered here is similar to that described in Section 5T with 
the exception that j(t, s) = jo(s)+t Ji(s). The function 7o(s) is defined in equation 
([9]) and 7i(s) is of the form 



7i( s ) = aih exp 



-2500 * 



h 



100 



where the value of h and a±h are listed in Table [T] and /3q = 0.06. Realizations of 



functional predictors were generated as described in section [5TT 

For each simulation, an appropriate a 2 was chosen to ensure that the squared 



multiple correlation coefficient R 2 = s 2 /[s 2 +a' 2 ] is 0.9. Here, s 2 = \ 



4 1 

4 Z-,i=l N-l ^ 

,N;t=l,-.. 



1 (Vu 
4} 



y t y 1 

denotes the average sample variance in the set {yu — en : i 
with y. t = £ Y%=i Hit- 

Results in terms of SSPE and MSE are displayed graphically in Figure [3j As 
the sample size increases, both SSPE and MSE of 70 and 71 decrease, providing 
empirical evidence that the estimates obtained using decomposition based penalty 
are consistent. For a given sample size, we observe a decrease in MSE as the value of 
«i increases up to 100. However, no further improvement is seen beyond a± = 100, 
suggesting that choice of ot\ has little or no influence the estimate provided it is 
sufficiently large. 

The estimated 70 and 71 are displayed in Figure]!} The estimated 7 (-) and 7i(-) 
are nearly identical to the simulated true functions, even for sample size iV = 100. 
Further inspection reveals a small estimation bias at the points where both 70 and 
71 have peaks, but this bias gradually disappeared as the sample size increased to 
N = 400. 



6 MRS study application 

We applied LongPEER to study the association of metabolite spectra obtained from 
basal ganglia on the global deficit score (GDS) in a longitudinal study of late stage 
HIV patients [20] ■ GDS is often used as a continuous measure of neurocognitive 
impairment (see, e.g., [7]) and a large GDS score is an indicator of high degree 
of impairment. The collected MRS spectra are composed of the combination of 
pure metabolite spectra, instrument noise and baseline profile. The spectra were 
sampled at n = 399 distinct sampling points each. We collected a total of n, = 306 
observations from iV = 114 subjects. The longitudinal observations for each subject 
were within 3 years from baseline. The number of observations per subject ranged 
from 1 to 5 with median 3. 

Information on spectra obtained from nine pure metabolites was available and 
hence we were able to use this to define a decomposition penalty Lq as in equa- 
tion (j7|) (with ot\ = 1000). Spectra of basal ganglia at baseline and pure metabolite 
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Figure 4: Regression function estimates for the simulation study described in Section 



5.2 Left panel shows the estimate of 7o(-) and right panel the estimate of 7i(-) 



spectra are displayed in Figure [Tj The pure metabolite spectra include spectra 
of Creatine (Cr), Glutamate (Glu), Glucose (Glc), Glycerophosphocholine (GPC), 
myoinositol (Ins), N-Acetylaspartate (NAA), N-Acetylaspartylglutamate (NAAG), 
scyllo-lnositol (Scyllo) and Taurine (Tau). It can be verified that at each locations 
of major bumps in basal ganglia spectra (left panel), there are bumps in at least 
one of pure metabolite profiles (right panel). We fitted the following model to data 



Vit = A) + Pi 



t+ / W it (s)j(t, s)ds + bi + e it , 



(9) 



where, j(t, s) = 7o(s) + t 71 (s) and y it and W it (-) are the GDS and basal ganglia 
spectrum for subject i at time t, respectively. We assume that e it ~ N(Q, af) and 6j 
is the subject-specific random intercept distributed as N(0, of). The estimates are 
obtained as the BLUP from the mixed model formulation described in Section 14.11 
Estimates of of and o\ were 0.08037 and 0.33115, respectively. Figure [H] displays 
the estimates of 7o(-) and 7i(-) with pointwise 95% confidence bands. To make the 
interpretation easier, we also include the selected pure metabolite spectra. These 
figures reveal that 7o(-) (the 'baseline' part of the regression function) is different 
from zero at the locations where at least one of the pure metabolites Cr, Glu, 
NAA, NAAG and Scyllo has a bump. Similarly, each non-zero part of 7i(-) (the 
'longitudinal' part of the regression function) coincides with bump locations of one 
or more pure metabolite profiles of Cr, Glu, NAA, GPC and Ins. 
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Figure 5: Estimates of the regression function (with 95 % point-wise confidence 
band) for the analysis described in Section [6j The top panel shows the estimated 
7o(") and bottom panel the estimate of 7i(-) • Scaled pure metabolite profiles are 
also shown on both plots. 
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Pointwise confidence interval for 7o(-) and 7i(-) contain the "zero" line over most 
of the intervals of interest. The estimated 70 is significant in the region s G (0.4, 0.5) 
and estimated 71 is significant in a wider region of s G (0.4, 0.6). To be precise, the 
peaks of the both %(■) and ji(-) are significant at the locations where the at least 
one of the pure metabolite profiles of NAA and Glu have bumps. 

The finding of the significant negative 'longitudinal' effect of NAA is worth notic- 
ing. This suggests that GDS increases as the NAA concentration decreases in basal 
ganglia. This finding is consistent with the studies where reduced concentration of 
NAA has been found to be associated with decrease in neuronal mass (HI 1231 135] . 

7 Discussion 

We have proposed a novel estimation method for longitudinal functional regression 
and derived some properties of the coefficient function estimate. Within this frame- 
work, the LongPEER method is the first to allow a coefficient function to vary with 
time. It extends the scope of generalized ridge regression to the realm of longitu- 
dinal data. The approach may be viewed as an extension of longitudinal mixed 
effects models, replacing scalar predictors by functional predictors. Advantages of 
this method include: 1) a framework that allows the regression function to vary 
over time; 2) the ability to incorporate structural information into the estimation 
process; and 3) easy implementation through the linear mixed model equivalence. 

The emphasis here is on a general statistical framework for incorporating sci- 
entific knowledge into the estimation process when both the scalar response and 
predictor functions are observed longitudinally. In absence of prior information, one 
may impose more vaguely-defined constraints — such as smoothness or re-weighted 
empirical subspaces — to estimate the coefficient function. An advantageous use of 
specific prior information is illustrated in the first simulation where smoothness 
constraints or a spline basis representations perform poorly. The second simula- 
tion shows a reduction in estimation error as sample size increases; this empirically 
suggests the consistency for such an estimate, provided the information about the 
regression function structure is correct. 

Solutions to the generalized ridge regression problem can be expressed in many 
forms. The linear mixed model equivalence provides an easy computational imple- 
mentation as well as an automatic choice of the tuning parameters using REML 
criterion. The GSVD provides algebraic insight and a convenient way to derive the 
bias and variance expressions of the estimates. Another natural way to obtain the 
regression function estimates is by using Bayesian equivalence |33j with the infor- 
mative priors quantifying the available scientific knowledge. 

One of the natural extensions of our work can incorporate multiple functional pre- 
dictors. For example, we can observe two functional predictors Wf(-) and Wj: 2 \-) 
with 7^(i, ■) and 7 < - 2 - ) (t, •) their associated coefficient functions, respectively. Fur- 
thermore, we can express j^(t, s) = % (s) + /i^(^)7i (s) + • — h /j (07ci ( s ) and 
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7 (2)( t) s ) = j( 2 \ s ) + ff)(t) + 7 f)( s ) + . . . + /< 2) (t) 7 < 2) (s). If and W/( 2 ) repre- 
sent design matrices for the two functional predictors, then we can estimate 7^(£, •) 
and 7^(£, •) by finding the BLUP estimate of 7W and from the mixed model, 
y = X(3 + W + iy ( 2 )^( 2 ) + + e. The simplified formula for bias and variance 



derived in Section 3.2 still holds with an additional assumption (W^) T V^ 1 W^ = 0. 

As presented here, the method addresses models having a continuous scalar out- 
come, but allowing for either binary and count responses is of interest. Indeed, an 
important problem that arises in MRS data is that of understanding the neurocog- 
nitive impairment status of HIV patients, defined as a binary variable, based on 
functional predictors collected over time. Extending our approach to these general 
settings is possible and currently being pursued. 
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